Abstract. We give a complete characterization of the behavior of the Anderson acceleration
(with arbitrary nonzero mixing parameters) on linear problems. Let $\nu$ be the grade of the residual at
the starting point with respect to the matrix defining the linear problem. We show that if Anderson
acceleration does not stagnate (that is, produces different iterates) up to $\nu$, then the sequence of its
iterates converges to the exact solution of the linear problem. Otherwise, the Anderson acceleration
converges to the wrong solution. Anderson acceleration and GMRES are essentially equivalent up
to the index where the iterates of Anderson acceleration begin to stagnate. This result also holds for
an optimized version of Anderson acceleration, where at each step the mixing parameter is chosen
so that it minimizes the residual of the current iterate.
1. Introduction. The Anderson acceleration, or Anderson mixing, was initially
developed in 1965 by Donald Anderson [1] as an iterative procedure for solving some
nonlinear integral equations arising in physics. It turns out that the Anderson
acceleration is very efficient for solving other types of nonlinear equations as well, see
[5], [5], and the literature cited therein. In [5] it was shown that on fixed point linear
problems the Anderson acceleration, with all mixing parameters equal to 1, and
GMRES are "essentially equivalent". In the present paper we extend the results of [5]
to general linear problems and general nonzero mixing parameters. By introducing
the notion of index of the Anderson acceleration, $\kappa_A$, we manage to give a complete
characterization of the behavior of the Anderson acceleration with infinite history on
linear problems. We show that the index of the Anderson acceleration is the same for
any choice of nonzero mixing parameters, and that it can be defined in terms of the
stagnation index of the GMRES method. The main result of the paper shows that
if the index of the Anderson acceleration coincides with the grade of the residual at
the starting point with respect to the matrix defining the linear problem, $\nu(A, r_0)$
[6, p. 37], then the Anderson acceleration converges to the exact solution of the linear
problem in either $\nu(A, r_0)$ or $\nu(A, r_0) + 1$ steps. If $\kappa_A < \nu(A, r_0)$, then the
Anderson acceleration converges to the wrong solution. We also investigate the optimal
Anderson acceleration, where at each step the mixing parameter is chosen so that it
minimizes the residual of the current iterate. We show that the performance of the
optimal Anderson acceleration is not essentially better than the performance of the
Anderson acceleration with arbitrary nonzero mixing parameters.

2. Simple Mixing. Consider the linear equation 
(2.1) $Ax + b = 0$

where $A$ is a nonsingular $N \times N$ matrix and $b$ is a given $N$-vector. We wish to
solve (2.1) with various iterative methods that produce sequences $\{x_n^M\}$, where the
superscript $M$ indicates the method that is being used. Let $x^* = -A^{-1} b$ be the
exact solution of this problem. Since the exact solution is usually not known, the
errors $x_n^M - x^*$ are difficult to estimate, so that the performance of the method is
assessed by analyzing the residuals $A x_n^M + b$. All methods use the same starting
point $x_0^M = x_0$, at which the residual is $r_0 = A x_0 + b$.

Consider now the simple fixed point iteration 

(2.2) $x_{n+1} = x_n + A x_n + b = M x_n + b$

where M = I + A. Of course this scheme need not converge, and an improvement 
consists in iterating 

(2.3) $x_{n+1}^S = (1 - \beta) x_n^S + \beta (M x_n^S + b) = x_n^S + \beta (A x_n^S + b)$

where $\beta$ is a suitably chosen parameter. The method averages or "mixes" the previous
iterate and the new fixed point iterate. We shall call this iteration simple mixing and 
indicate it with the superscript S. 
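
As a small illustration of (2.2) and (2.3), the following sketch runs simple mixing on a 2 x 2 diagonal system. The matrix, the right-hand side, the step counts, and the helper names `matvec`, `residual`, and `simple_mixing` are assumptions chosen for this example and do not come from the paper.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def residual(A, x, b):
    """Residual A x + b of the equation A x + b = 0."""
    return [ax + bi for ax, bi in zip(matvec(A, x), b)]

def simple_mixing(A, b, x0, beta, steps):
    """Iterate x <- x + beta * (A x + b), i.e. (2.3); beta = 1 gives (2.2)."""
    x = list(x0)
    for _ in range(steps):
        r = residual(A, x, b)
        x = [xi + beta * ri for xi, ri in zip(x, r)]
    return x

# Assumed test problem: A = diag(-2, -4), b = (2, 4), exact solution x* = (1, 1).
A = [[-2.0, 0.0], [0.0, -4.0]]
b = [2.0, 4.0]

# beta = 1: the error is multiplied by I + A = diag(-1, -3), so it does not converge.
x_plain = simple_mixing(A, b, [0.0, 0.0], beta=1.0, steps=20)

# beta = 0.3: the error is multiplied by I + 0.3 A = diag(0.4, -0.2), so it converges.
x_mixed = simple_mixing(A, b, [0.0, 0.0], beta=0.3, steps=60)
```

In this example the plain fixed point iteration diverges in the second component, while a smaller mixing parameter makes the same scheme contract.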

3. GMRES and Anderson mixing (Anderson acceleration). The GMRES 
method for the equation $Ax + b = 0$ determines

(3.1) $x_n^G = x_0 + z_n$, where $z_n = \operatorname{argmin} \{ \|A(x_0 + z) + b\| : z \in \mathcal{K}_n \}$.
Here $\mathcal{K}_n$ is the Krylov space

(3.2) $\mathcal{K}_n = \mathcal{K}_n(A, r_0) = \operatorname{Span}\{r_0, A r_0, \ldots, A^{n-1} r_0\}$.

Note that since $r_0 = A x_0 + b$, $x_1^G$ is given by $x_1^G = x_0 + \beta^* (A x_0 + b)$, where

(3.3) $\beta^* = \operatorname{argmin}_{\beta} \|r_0 + \beta A r_0\| = -\dfrac{r_0^T A r_0}{\|A r_0\|^2}$,

that is, the first step of GMRES is a simple mixing step with a mixing parameter $\beta$
that minimizes the residual of the result.
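
The closed form in (3.3) can be checked on a small example. The sketch below uses exact rational arithmetic; the data and the helper names `matvec`, `dot`, and `beta_star` are assumptions made for illustration.

```python
from fractions import Fraction

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def beta_star(A, r0):
    """Minimizer of ||r0 + beta * A r0|| over beta, as in (3.3)."""
    Ar0 = matvec(A, r0)
    return -dot(r0, Ar0) / dot(Ar0, Ar0)

# Assumed data: A = diag(-2, -4), b = (2, 4), x0 = 0, so that r0 = b.
A = [[-2, 0], [0, -4]]
r0 = [Fraction(2), Fraction(4)]

beta = beta_star(A, r0)
Ar0 = matvec(A, r0)
# Residual after the first GMRES step, r1 = r0 + beta * A r0.
r1 = [r + beta * a for r, a in zip(r0, Ar0)]
```

At the minimizer the new residual `r1` is orthogonal to `A r0`, which is the normal equation of the one-dimensional least squares problem in (3.3).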

Let $K_n$ denote the projection onto the subspace $A\mathcal{K}_n$ of $\mathcal{K}_{n+1}$. From (3.1) it follows
that

(3.4) $A(x_n^G - x_0) = A z_n = -K_n (A x_0 + b) = -K_n r_0$.
Therefore we can write

(3.5) $x_n^G - x^* = x_0 - x^* - A^{-1} K_n r_0 = A^{-1} (I - K_n) r_0$,
which is equivalent to

(3.6) $A x_n^G + b = (I - K_n) r_0$.
We also note that

(3.7) $K_n = K_{n+1} \iff (I - K_n) A^{n+1} r_0 = 0$,

(3.8) $x_n^G = x_{n+1}^G \iff K_n r_0 = K_{n+1} r_0 \iff r_0^T (I - K_n) A^{n+1} r_0 = 0$.
Following Wilkinson [6, p. 37], the grade of $r_0$ with respect to $A$ is defined as

(3.9) $\nu(A, r_0) = 1 + \max\{n : \dim \mathcal{K}_n = n\}$.

Thus $\nu(A, r_0)$ is the smallest integer $n$ for which there is a non-zero polynomial $p(z)$
of degree $n$ such that $p(A) r_0 = 0$, i.e.,

(3.10) $\nu(A, r_0) = \min\{n \in \mathbb{N} : r_0, A r_0, \ldots, A^n r_0 \text{ are linearly dependent}\}$.

Then clearly $\nu(A, r_0) \le N$, and $p(z)$ divides the minimal polynomial of $A$. Also,
$\mathcal{K}_n = \mathcal{K}_{\nu(A, r_0)}$ for all $n \ge \nu = \nu(A, r_0)$.
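
The characterization (3.10) translates directly into a small computation. The following sketch finds the grade by appending powers $A^n r_0$ until the vectors become linearly dependent, using exact rational Gaussian elimination. The helper names `rank`, `grade`, and `cycle_matrix`, the convention of returning 0 for a zero residual, and the test matrices are assumptions for this illustration.

```python
from fractions import Fraction

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rank(vectors):
    """Rank of a list of vectors, by exact Gaussian elimination on the rows."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * p for a, p in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def grade(A, r0):
    """Smallest n >= 1 such that r0, A r0, ..., A^n r0 are dependent, as in (3.10)."""
    if all(v == 0 for v in r0):
        return 0  # assumption of this sketch: a zero residual gets grade 0
    krylov = [list(r0)]
    n = 0
    while True:
        n += 1
        krylov.append(matvec(A, krylov[-1]))
        if rank(krylov) < len(krylov):
            return n

def cycle_matrix(N):
    """Permutation matrix P with P e_j = e_{(j+1) mod N} (0-based indices)."""
    return [[1 if i == (j + 1) % N else 0 for j in range(N)] for i in range(N)]

D = [[-2, 0], [0, -4]]
g_generic = grade(D, [2, 4])                     # r0 has both eigencomponents
g_eigen = grade(D, [1, 0])                       # r0 is an eigenvector
g_cycle = grade(cycle_matrix(4), [1, 0, 0, 0])   # cyclic shift of a basis vector
```

The loop always terminates, since at most $N$ vectors in $\mathbb{R}^N$ can be linearly independent.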

Proposition 3.1. The GMRES method (3.1) converges in exactly $\nu(A, r_0)$ steps,
i.e.,

$x_n^G \ne x^*$ for $n < \nu(A, r_0)$, and $x_n^G = x^*$ for $n \ge \nu(A, r_0)$.

Proof. If $n = \nu(A, r_0)$, then from (3.10) it follows that there are numbers $\xi_0, \ldots, \xi_n$
such that $\sum_{i=0}^{n} \xi_i A^i r_0 = 0$. It is easily seen that we must have $\xi_0 \ne 0$ and $\xi_n \ne 0$,
because otherwise the minimality of $n$ is contradicted. Since $\xi_0 \ne 0$, we deduce that
$r_0 \in A\mathcal{K}_n$. According to (3.5) we have therefore $x_n^G = x^*$. If $n < \nu(A, r_0)$, then the
vectors $r_0, A r_0, \ldots, A^n r_0$ are linearly independent, so that $r_0 \notin A\mathcal{K}_n$, which means
that $x_n^G \ne x^*$. □

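We next summarize Anderson mixing for the nonlinear equation $f(x) = 0$.

Anderson mixing. Given a nonlinear operator $f$ on $\mathbb{R}^N$, an initial point $x_0 \in \mathbb{R}^N$, a
sequence $\beta_0, \beta_1, \ldots$ in $\mathbb{R} \setminus \{0\}$, and an integer $m$:

0. Set

$f_0 = f(x_0)$, $x_1 = x_0 + \beta_0 f_0$.

1. For $k = 1, 2, \ldots$, set

$m_k = \min\{m, k\}$, $r_k = k - m_k$, $f_k = f(x_k)$,

$(\alpha_{0,k}, \ldots, \alpha_{m_k,k}) = \operatorname{argmin}_{(\alpha_0, \ldots, \alpha_{m_k})} \Big\{ \Big\| \sum_{i=0}^{m_k} \alpha_i f_{r_k+i} \Big\|_2 : \sum_{i=0}^{m_k} \alpha_i = 1 \Big\}$,

$x_{k+1} = \sum_{i=0}^{m_k} \alpha_{i,k} x_{r_k+i} + \beta_k \sum_{i=0}^{m_k} \alpha_{i,k} f_{r_k+i} = \sum_{i=0}^{m_k} \alpha_{i,k} (x_{r_k+i} + \beta_k f_{r_k+i})$.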

For $\beta_i = 1$ and $f(x) = g(x) - x$, this algorithm reduces to Algorithm AA from
[5] . The version given here was proposed in pQ . There is also an equivalent (in exact 
arithmetic) version of this algorithm in terms of difference vectors $x_k - x_{k-1}$ and
$f(x_k) - f(x_{k-1})$. It is presented in [2] in order to reveal its connection to multisecant
and Broyden type methods for solving nonlinear operator equations, but it obscures 
the "mixing" idea. We therefore do not give that version here. 

If we solve the constrained optimization problem from this algorithm with the
substitution method, e.g. by setting $\alpha_0 = 1 - \sum_{i=1}^{m_k} \alpha_i$ and solving the corresponding
unconstrained optimization problem for $\alpha_1, \ldots, \alpha_{m_k}$,

(3.11) $\min_{\alpha_1, \ldots, \alpha_{m_k}} \| f_{r_k} + \alpha_1 (f_{r_k+1} - f_{r_k}) + \alpha_2 (f_{r_k+2} - f_{r_k}) + \cdots + \alpha_{m_k} (f_k - f_{r_k}) \|$,

then the differencing may lead to problems with loss of significance. However, the
authors of [5] claim that implementing the Anderson acceleration with a substitution
method "offers several advantages and, in our experience, no evident disadvantages".
Other methods for solving the constrained optimization problem are discussed in [4].

4. Convergence analysis in the linear case. In what follows we will investigate
the relationship between the Anderson mixing and GMRES, for linear systems
of the form (2.1). We use the notation $x_n^G$ for the sequence generated by the GMRES
method as described in the previous section, and $x_n^A$ for the sequence generated by
Anderson mixing with $m = \infty$ and $f(x) = Ax + b$, i.e.,

(4.1) $x_{n+1}^A = \sum_{i=0}^{n} \alpha_{i,n} x_i^A + \beta_n \sum_{i=0}^{n} \alpha_{i,n} (A x_i^A + b) = \bar{x}_{n+1}^A + \beta_n (A \bar{x}_{n+1}^A + b)$,

$(\alpha_{0,n}, \ldots, \alpha_{n,n}) = \operatorname{argmin}_{(\alpha_0, \ldots, \alpha_n)} \Big\{ \Big\| b + A \sum_{i=0}^{n} \alpha_i x_i^A \Big\| : \sum_{i=0}^{n} \alpha_i = 1 \Big\}$.

Here

(4.2) $\bar{x}_{n+1}^A = \sum_{i=0}^{n} \alpha_{i,n} x_i^A$

may be viewed as a prediction of the next iterate, a linear combination of all previous
iterates. Using $\sum_{i=0}^{n} \alpha_{i,n} = 1$, we deduce that

(4.3) $\bar{x}_{n+1}^A = \Big(1 - \sum_{i=1}^{n} \alpha_{i,n}\Big) x_0 + \sum_{i=1}^{n} \alpha_{i,n} x_i^A = x_0 + \sum_{i=1}^{n} \alpha_{i,n} (x_i^A - x_0)$.

Therefore (4.1) can be written as

(4.4) $x_{n+1}^A = x_0 + \sum_{i=1}^{n} \alpha_{i,n} (x_i^A - x_0) + \beta_n \Big( b + A \Big( x_0 + \sum_{i=1}^{n} \alpha_{i,n} (x_i^A - x_0) \Big) \Big)$,

(4.5) $(\alpha_{1,n}, \ldots, \alpha_{n,n}) = \operatorname{argmin}_{(\alpha_1, \ldots, \alpha_n)} \Big\| b + A \Big( x_0 + \sum_{i=1}^{n} \alpha_i (x_i^A - x_0) \Big) \Big\|$.

Let us consider the linear subspace

(4.6) $\mathcal{L}_n = \operatorname{Span}\{x_1^A - x_0, \ldots, x_n^A - x_0\}$,

and denote by $L_n$ the projection onto $A\mathcal{L}_n$. With this notation we have

(4.7) $A(\bar{x}_{n+1}^A - x_0) = -L_n (A x_0 + b) = -L_n r_0$,
which is equivalent to

(4.8) $\bar{x}_{n+1}^A = x_0 - A^{-1} L_n r_0 = x^* + A^{-1} r_0 - A^{-1} L_n r_0 = x^* + A^{-1} (I - L_n) r_0$.
Therefore,

(4.9) $A \bar{x}_{n+1}^A + b = (I - L_n) r_0$,

(4.10) $x_{n+1}^A - x^* = (I + \beta_n A) A^{-1} (I - L_n) r_0 = (I + \beta_n A)(\bar{x}_{n+1}^A - x^*)$,

(4.11) $A x_{n+1}^A + b = (I + \beta_n A)(I - L_n) r_0 = (I + \beta_n A)(A \bar{x}_{n+1}^A + b)$,

(4.12) $A(x_{n+1}^A - x_0) = -L_n r_0 + \beta_n A (I - L_n) r_0 = \beta_n A r_0 - L_n r_0 - \beta_n A L_n r_0$.
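
To make the full-history iteration concrete, here is a minimal sketch of Anderson mixing for $f(x) = Ax + b$ in the substitution form: the coefficients are obtained from the normal equations of the unconstrained least squares problem over the differences $x_i^A - x_0$, and the next iterate is the prediction plus $\beta_n$ times its residual. Exact rational arithmetic is used so that the result can be compared with $x^*$ exactly. The names `solve` and `anderson_linear`, the test matrix, and the choice of mixing parameters are assumptions for illustration; the solver assumes the differences are linearly independent, so the sketch is only meant for the steps before stagnation.

```python
from fractions import Fraction

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def solve(G, h):
    """Solve the square system G y = h by Gauss-Jordan elimination (G nonsingular)."""
    n = len(h)
    M = [list(G[i]) + [h[i]] for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for i in range(n):
            if i != c:
                M[i] = [vi - M[i][c] * vc for vi, vc in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

def anderson_linear(A, b, x0, betas):
    """Return x_0, x_1, ..., x_{len(betas)} for Anderson mixing with full history."""
    r0 = [ax + bi for ax, bi in zip(matvec(A, x0), b)]
    xs = [list(x0), [xi + betas[0] * ri for xi, ri in zip(x0, r0)]]
    for n in range(1, len(betas)):
        diffs = [[xi - zi for xi, zi in zip(x, x0)] for x in xs[1:]]  # x_i - x_0
        cols = [matvec(A, d) for d in diffs]                          # A (x_i - x_0)
        # Normal equations for min || r0 + sum_i alpha_i * cols[i] ||.
        G = [[dot(ci, cj) for cj in cols] for ci in cols]
        h = [-dot(ci, r0) for ci in cols]
        alpha = solve(G, h)
        # Predicted iterate and its residual.
        xbar = [x0[k] + sum(a * d[k] for a, d in zip(alpha, diffs))
                for k in range(len(x0))]
        rbar = [ax + bi for ax, bi in zip(matvec(A, xbar), b)]
        xs.append([xb + betas[n] * rb for xb, rb in zip(xbar, rbar)])
    return xs

# Assumed example: A = diag(-1, -2, -3), b = (1, 1, 1), x0 = 0.
A = [[-1, 0, 0], [0, -2, 0], [0, 0, -3]]
b = [Fraction(1)] * 3
x0 = [Fraction(0)] * 3
betas = [Fraction(1), Fraction(1, 2), Fraction(-1), Fraction(2)]  # arbitrary, nonzero
xs = anderson_linear(A, b, x0, betas)
x_star = [Fraction(1), Fraction(1, 2), Fraction(1, 3)]            # -A^{-1} b
```

In this example $A$ is negative definite, so the residual can always be reduced along its own direction, and $r_0$ has grade 3; after three least squares steps the differences span $\mathbb{R}^3$ and `xs[4]` equals `x_star` exactly, whatever nonzero mixing parameters are used.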
Definition 4.1. The index of the Anderson acceleration is defined as

(4.13) $\kappa_A = \min\{n \in \mathbb{N} : x_1^A - x_0, x_2^A - x_0, \ldots, x_{n+1}^A - x_0 \text{ are linearly dependent}\}$.
The stagnation index of the GMRES method (3.1) is defined as

(4.14) $\eta^G = \min\{n \in \{0, 1, \ldots\} : x_n^G = x_{n+1}^G\}$.

The above notion allows for a complete description of the convergence properties of
the Anderson acceleration for linear problems, for arbitrary sequences of relaxation
parameters $\beta_n$.

Proposition 4.2. The index $\kappa_A$ of the Anderson acceleration is always less than
or equal to the grade $\nu(A, r_0)$, and the sequences generated by the two methods
satisfy the following properties:

a) $x_{n+1}^A = x_n^G + \beta_n (A x_n^G + b)$, $n = 0, 1, \ldots, \kappa_A$;

b) $\bar{x}_{n+1}^A = x_n^G$, $n = 0, 1, \ldots, \kappa_A$.

Proof. Let us first prove by induction that $\mathcal{L}_n \subseteq \mathcal{K}_n$. Since $x_1^A - x_0 = \beta_0 r_0$, this
is readily verified for $n = 1$. Assume that $\mathcal{L}_n \subseteq \mathcal{K}_n$. Then from (4.4) it follows that

(4.15) $x_{n+1}^A - x_0 = \beta_n r_0 + (I + \beta_n A) \sum_{i=1}^{n} \alpha_{i,n} (x_i^A - x_0) \in \mathcal{K}_n + A\mathcal{K}_n = \mathcal{K}_{n+1}$,

which completes the induction step. Since $\mathcal{L}_n \subseteq \mathcal{K}_n$, the linear independence of
$x_1^A - x_0, \ldots, x_n^A - x_0$ always implies the linear independence of $r_0, \ldots, A^{n-1} r_0$. This
implies that $\kappa_A \le \nu(A, r_0)$. It also implies that

(4.16) $\mathcal{L}_n = \mathcal{K}_n$, and $\dim(\mathcal{K}_n) = n$, for $n = 1, \ldots, \kappa_A$.

Point a) of our proposition follows from (3.5) and (4.10), while point b) follows
from (3.5) and (4.8). □

The result shows that varying the sequence of relaxation parameters $\beta_k$ does not
change the behavior of the sequence $x_k^A$ in a significant way for linear problems, if
exact arithmetic is used and as long as $k \le m$. However, for nonlinear problems, the
convergence behavior can be quite sensitive to the "right" choice of the $\beta_k$; see e.g.
[3]. Also, the choice of the $\beta_n$ does matter for large $n$ if $m$ is finite, as is shown by
numerical experience.

It is instructive to observe what exactly happens at the step $\kappa_A$. Of course, if
$\kappa_A = \nu(A, r_0)$, the Anderson mixing has converged to the correct solution. In the
case where $\kappa_A < \nu(A, r_0)$, we have the following characterization.

Proposition 4.3. If $n < \kappa_A < \nu(A, r_0)$, then

a) $\alpha_{n,n} \ne 0$ in (4.5),

b) $\|A x_{n-1}^G + b\| > \|A x_n^G + b\|$.
If $n = \kappa_A < \nu(A, r_0)$, then

c) $\alpha_{n,n} = 0$ in (4.5),

d) $(A x_n^G + b)^T A (x_{n+1}^A - x_0) = 0$,

e) $x_{n-1}^G = \bar{x}_n^A = \bar{x}_{n+1}^A = x_n^G = \bar{x}_{n+2}^A$,

f) $\|A x_{n-1}^G + b\| = \|A x_n^G + b\|$.
Proof. Using (4.16) we can write

(4.17) $x_n^A - x_0 = \sum_{i=1}^{n} \xi_{i,n} A^{i-1} r_0$, with $\xi_{n,n} \ne 0$, $n = 1, \ldots, \kappa_A$,

for some uniquely determined scalars $\xi_{1,n}, \ldots, \xi_{n,n}$. Using (4.4) we obtain

$x_{n+1}^A - x_0 = \beta_n r_0 + \sum_{i=1}^{n} \alpha_{i,n} (x_i^A - x_0) + \beta_n A \sum_{i=1}^{n-1} \alpha_{i,n} (x_i^A - x_0) + \beta_n \alpha_{n,n} A (x_n^A - x_0)$

$= y_n + \beta_n \alpha_{n,n} \xi_{n,n} A^n r_0$, with $y_n \in \mathcal{L}_n = \mathcal{K}_n$, $n = 1, \ldots, \kappa_A$.

If $n < \kappa_A$, then $\mathcal{L}_{n+1} \ne \mathcal{L}_n$, so that we must have $\alpha_{n,n} \ne 0$. This proves part a). On
the other hand, if $n = \kappa_A < \nu(A, r_0)$, then $\mathcal{L}_{n+1} = \mathcal{L}_n = \mathcal{K}_n \ne \mathcal{K}_{n+1}$. This implies
$\alpha_{n,n} = 0$, which shows the validity of part c). If $\alpha_{n,n} = 0$, then from (4.4) and (4.3) we
have $\bar{x}_n^A = \bar{x}_{n+1}^A$, which, in view of Proposition 4.2, implies $x_{n-1}^G = \bar{x}_n^A = \bar{x}_{n+1}^A = x_n^G$.
Since $\mathcal{L}_{n+1} = \mathcal{L}_n$ we also have $\bar{x}_{n+1}^A = \bar{x}_{n+2}^A$, which completes the proof of part e).
Part f) follows trivially.

To prove d), observe that the normal equations from (4.5) can be written as

$(A \bar{x}_{n+1}^A + b)^T A (x_i^A - x_0) = 0$, $i = 1, \ldots, n$.

Geometrically, the above relation follows immediately, since according to (4.9),
$A \bar{x}_{n+1}^A + b \in (A\mathcal{L}_n)^{\perp}$. In case $n = \kappa_A$, according to e), we have $\bar{x}_{n+1}^A = x_n^G$, and
$x_{n+1}^A - x_0 \in \mathcal{L}_{n+1} = \mathcal{L}_n$, which gives part d).

To prove part b), note that by (3.1), $\|A x_n^G + b\| \le \|A x_{n-1}^G + b\|$. Suppose there is
equality, then from (3.6) $\|(I - K_n) r_0\| = \|(I - K_{n-1}) r_0\|$. Since these are orthogonal
projections, this implies in turn $K_n r_0 = K_{n-1} r_0$, and therefore by Proposition 4.2,
(3.5), and (4.8), we should have $\bar{x}_{n+1}^A = x_n^G = x_{n-1}^G = \bar{x}_n^A$. But this is impossible,
since $\bar{x}_n^A - x_0 \in \mathcal{L}_{n-1}$, but $\bar{x}_{n+1}^A - x_0 \in \mathcal{L}_n \setminus \mathcal{L}_{n-1}$, due to $\alpha_{n,n} \ne 0$. Therefore,
$\|A x_n^G + b\| < \|A x_{n-1}^G + b\|$. □

Using Propositions 3.1, 4.2 and 4.3 we obtain the following theorem, which
represents the main result of our paper.

Theorem 4.4. Assume that $A$ is an invertible $N \times N$ matrix. Consider the
GMRES method (3.1) and the Anderson acceleration method (4.4) for finding the
solution $x^*$ of the linear equation (2.1), with arbitrary nonzero mixing parameters
$\beta_0, \beta_1, \ldots$. For any starting point $x_0$, consider also the grade $\nu(A, r_0)$, defined in
(3.10), and the index of the Anderson acceleration $\kappa_A$, defined in (4.13).

(i) $\kappa_A = \nu(A, r_0) = \nu$ if and only if

(4.18) $\|A x_0 + b\| > \|A x_1^G + b\| > \cdots > \|A x_{\nu-1}^G + b\| > \|A x_\nu^G + b\| = 0$.
Moreover, in this case the sequence produced by the Anderson acceleration satisfies

(4.19) $x_{n+1}^A = \begin{cases} x_n^G + \beta_n (A x_n^G + b), & \text{if } n < \nu, \\ x^*, & \text{if } n \ge \nu, \end{cases}$

(4.20) $\bar{x}_{n+1}^A = x_n^G$, $\quad n = 0, 1, \ldots$.

(ii) $\kappa = \kappa_A < \nu(A, r_0)$ if and only if

(4.21) $\|A x_0 + b\| > \|A x_1^G + b\| > \cdots > \|A x_{\kappa-1}^G + b\| = \|A x_\kappa^G + b\| > 0$.

Moreover, in this case $\kappa_A = \eta^G + 1$, and the sequence produced by the Anderson
acceleration satisfies

(4.22) $x_{n+1}^A = \begin{cases} x_n^G + \beta_n (A x_n^G + b), & \text{if } n < \kappa, \\ x_{\kappa-1}^G + \beta_n (A x_{\kappa-1}^G + b), & \text{if } n \ge \kappa, \end{cases}$

(4.23) $\bar{x}_{n+1}^A = \begin{cases} x_n^G, & \text{if } n < \kappa, \\ x_{\kappa-1}^G, & \text{if } n \ge \kappa. \end{cases}$

Proof.

The fact that $\kappa_A = \nu(A, r_0) = \nu$ implies (4.18), (4.19), and (4.20) follows from
Propositions 3.1, 4.2 and 4.3.

In order to prove that $\kappa = \kappa_A < \nu(A, r_0)$ implies (4.21), we first prove by induction
that $\mathcal{L}_n = \mathcal{L}_\kappa = \mathcal{K}_\kappa$ for all $n > \kappa$. This is certainly true for $n = \kappa + 1$ from
Definition 4.1. Suppose that our statement is true for an $n > \kappa$. Then, according to
(3.5), (4.8), and point e) of Proposition 4.3 we have $\bar{x}_{n+1}^A = x_\kappa^G = x_{\kappa-1}^G$. This proves
$x_{n+1}^A = x_{\kappa-1}^G + \beta_n (A x_{\kappa-1}^G + b)$. To complete the induction step, we note that from
(3.4) and (3.6) it follows that

$A(x_{n+1}^A - x_0) = A(x_{\kappa-1}^G - x_0) + \beta_n A (A x_{\kappa-1}^G + b) = \beta_n A r_0 - (I + \beta_n A) K_{\kappa-1} r_0$.

Since $A K_{\kappa-1} r_0 \in A^2 \mathcal{K}_{\kappa-1} \subset \mathcal{K}_{\kappa+1}$, we deduce that $x_{n+1}^A - x_0 \in \mathcal{K}_\kappa$, which shows that
$\mathcal{L}_{n+1} = \mathcal{K}_\kappa$. The fact that $\kappa = \kappa_A < \nu(A, r_0)$ implies (4.22) and (4.23), and that
$\kappa_A = \eta^G + 1$ in this case, follows from Propositions 4.2 and 4.3.

Assume now that (4.18) holds, but $\kappa_A \ne \nu(A, r_0) = \nu$. Since $\kappa_A \le \nu(A, r_0)$, this
implies $\kappa = \kappa_A < \nu(A, r_0)$, which in turn, as seen above, implies (4.21), so that (4.18)
cannot be true. Hence, (4.18) is equivalent to $\kappa_A = \nu(A, r_0) = \nu$.

Similarly, if $\kappa = \kappa_A < \nu(A, r_0)$ is not true, we must have $\kappa = \kappa_A = \nu(A, r_0)$,
which, as seen above, implies (4.18), so that (4.21) is not true. Therefore (4.21) is
equivalent to $\kappa = \kappa_A < \nu(A, r_0)$. □

Theorem 4.4 gives a complete characterization of the behavior of the Anderson
acceleration on linear problems. If $\kappa_A = \nu(A, r_0)$, then the Anderson acceleration
converges to $x^*$. If $\kappa_A < \nu(A, r_0)$, then $\kappa_A$ is precisely the first index for which
GMRES stagnates (i.e. produces two identical successive iterates). If this ever
happens, GMRES continues to generate larger Krylov spaces, and it will eventually
converge to $x^*$, while Anderson mixing will then stagnate forever. An extreme example
is given by $A = P_N$, where $P_N$ is the permutation matrix for the cycle $(1 2 3 \ldots N)$,
with minimal polynomial $q(z) = 1 - z^N$. Then if $b$ is any standard basis vector $e_k$,
Anderson mixing will immediately stagnate (i.e. $\kappa_A = 1$), while GMRES will converge
in $\nu(A, b) = N$ steps, stagnating at the initial value until the very last step.
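
The cyclic permutation example can be verified by a direct computation. The sketch below assumes the starting point $x_0 = 0$ (so that $r_0 = b$), the sizes $N = 5$, $k = 2$ (0-based), and two arbitrary nonzero mixing parameters; the helper names are hypothetical. It relies on the fact that, for this particular matrix, the vectors $A r_0, \ldots, A^N r_0$ are distinct standard basis vectors and hence orthonormal, so the squared GMRES residual after $n$ steps is $1 - \sum_{j \le n} (r_0^T A^j r_0)^2$.

```python
from fractions import Fraction

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def cycle_matrix(N):
    """Permutation matrix P with P e_j = e_{(j+1) mod N} (0-based indices)."""
    return [[1 if i == (j + 1) % N else 0 for j in range(N)] for i in range(N)]

N, k = 5, 2
A = cycle_matrix(N)
b = [1 if i == k else 0 for i in range(N)]
x0 = [Fraction(0)] * N          # assumption: start at the origin, so r0 = b
r0 = b

# Optimal first mixing parameter (3.3); it vanishes because r0 is orthogonal to A r0.
Ar0 = matvec(A, r0)
beta_star = Fraction(-dot(r0, Ar0), dot(Ar0, Ar0))

# Squared GMRES residuals after n = 1, ..., N steps (orthonormal columns, see above).
powers, v = [], r0
for _ in range(N):
    v = matvec(A, v)
    powers.append(v)
gmres_res_sq = [1 - sum(dot(r0, p) ** 2 for p in powers[:n]) for n in range(1, N + 1)]

# Two Anderson steps with arbitrary nonzero mixing parameters.
beta0, beta1 = Fraction(1, 2), Fraction(-3)
x1 = [z + beta0 * r for z, r in zip(x0, r0)]
c = matvec(A, [a - z for a, z in zip(x1, x0)])       # A (x_1 - x_0)
alpha = -dot(c, r0) / dot(c, c)                      # least squares coefficient, n = 1
xbar2 = [z + alpha * (a - z) for a, z in zip(x1, x0)]
rbar2 = [p + q for p, q in zip(matvec(A, xbar2), b)]
x2 = [p + beta1 * q for p, q in zip(xbar2, rbar2)]
```

Here `x2 - x0` is a multiple of `x1 - x0`, which is the linear dependence that defines index 1, while the GMRES residual stays at its initial value until step $N$.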



Corollary 4.5. The Anderson acceleration for linear problems converges in
at most $\kappa_A + 1$ steps, but not necessarily to the solution of the linear problem. If
$\kappa_A = \nu(A, r_0)$, the Anderson acceleration converges to the exact solution of the linear
problem in either $\nu(A, r_0)$ or $\nu(A, r_0) + 1$ steps.

Proof. The first part of the Corollary follows directly from Theorem 4.4. We note
that if $n < \kappa_A$, then $x_{n+1}^A - x^* = x_n^G - x^* + \beta_n (A x_n^G + b) = x_n^G - x^* + \beta_n A (x_n^G - x^*)$.
Therefore $x_{n+1}^A = x^*$ if and only if $x_n^G - x^*$ is an eigenvector of $A$ with eigenvalue
$-1/\beta_n$, but in this case we have $\|A x_{n+1}^G + b\| \le \|A x_{n+1}^A + b\| = 0$, so that $x_{n+1}^G = x^*$,
and therefore $n + 1 = \nu(A, r_0)$. □

Corollary 4.6. We have $\eta^G = 0$, $\kappa_A = 1$ if and only if the quantity $\beta^*$ defined
in (3.3) vanishes.

Proof. If $\beta^* = 0$, then (as noted in the remarks following (3.3)), $x_1^G = x_0 +
\beta^* (A x_0 + b) = x_0$, and hence $\eta^G = \kappa_A - 1 = 0$. □

5. Anderson acceleration with optimized mixing parameters. The Anderson
acceleration for the linear problem (2.1) defined in (4.4) depends on a sequence
of mixing parameters $\beta_0, \beta_1, \ldots$. In this section we consider a variant of the Anderson
acceleration, where at each step $\beta_n$ is chosen so that the residual at $x_{n+1}^A$ is minimal.
More precisely we consider the following algorithm:

Optimized Anderson acceleration for linear problems.
Set $x_0^{A*} = x_0$ and $x_1^{A*} = x_0 + \beta_0^* (A x_0 + b)$, with $\beta_0^* = \beta^*$ defined in (3.3);
For $n = 1, 2, \ldots$
Compute

$(\alpha_{1,n}^*, \ldots, \alpha_{n,n}^*) = \operatorname{argmin}_{(\alpha_1, \ldots, \alpha_n)} \Big\| b + A \Big( x_0 + \sum_{i=1}^{n} \alpha_i (x_i^{A*} - x_0) \Big) \Big\|$;

Compute

$\bar{x}_{n+1}^{A*} = x_0 + \sum_{i=1}^{n} \alpha_{i,n}^* (x_i^{A*} - x_0)$.

If $A \bar{x}_{n+1}^{A*} + b = 0$, set $\beta_n^* = 0$. Otherwise set

$\beta_n^* = -\dfrac{(A \bar{x}_{n+1}^{A*} + b)^T A (A \bar{x}_{n+1}^{A*} + b)}{\|A (A \bar{x}_{n+1}^{A*} + b)\|^2}$.

Set $x_{n+1}^{A*} = \bar{x}_{n+1}^{A*} + \beta_n^* (A \bar{x}_{n+1}^{A*} + b)$.
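
For the reader's convenience, the choice of $\beta_n^*$ in the algorithm can be obtained from a one-dimensional least squares computation. The abbreviations $\bar r$ and $\varphi$ below are introduced only for this derivation; the residual of $\bar{x}_{n+1}^{A*} + \beta \bar r$ is $\bar r + \beta A \bar r$.

```latex
\begin{align*}
\bar r &:= A \bar{x}_{n+1}^{A*} + b, \qquad
\varphi(\beta) := \|\bar r + \beta A \bar r\|^2
  = \|\bar r\|^2 + 2 \beta\, \bar r^T A \bar r + \beta^2 \|A \bar r\|^2, \\
\varphi'(\beta) = 0 \;&\Longleftrightarrow\;
\beta = \beta_n^* = -\frac{\bar r^T A \bar r}{\|A \bar r\|^2}, \\
\varphi(\beta_n^*) &= \|\bar r\|^2 - \frac{(\bar r^T A \bar r)^2}{\|A \bar r\|^2}
  = \|\bar r\|^2 - \beta_n^{*2}\, \|A \bar r\|^2 .
\end{align*}
```

The last line shows that the optimized step reduces the squared residual of the prediction by exactly $\beta_n^{*2} \|A \bar r\|^2$, so the reduction is strict precisely when $\beta_n^* \ne 0$.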

We note that Theorem 4.4 implies that $\kappa_A$ is the same for any choice of nonzero
mixing parameters $\beta_0, \beta_1, \ldots$ and that the sequence $\bar{x}_n^A$ is independent of this choice.
Also, once $\beta_n^* = 0$, then clearly $\beta_r^* = 0$ for all $r > n$. Therefore, if $\beta_n^* \ne 0$ for some
$n \ge 0$, then

$\beta_r^* \ne 0$, $\bar{x}_{r+1}^{A*} = \bar{x}_{r+1}^A$

for all $r \le n$. However, it seems possible that the $\beta_n^*$ become zero for some $n <
\kappa_A$ and that optimized Anderson acceleration stagnates before a general Anderson
acceleration scheme stagnates (in which $\beta_n \ne 0$ for all $n$ is enforced). We now show
that this can never happen and that $\beta_n^*$ becomes zero precisely for $n > \kappa_A$.

Theorem 5.1. Assume that $A$ is an invertible $N \times N$ matrix. Consider the
GMRES method (3.1), the Anderson acceleration method (4.4), with arbitrary nonzero
mixing parameters $\beta_0, \beta_1, \ldots$, and the optimized Anderson acceleration for linear
problems described above. Then for all $n$, $\beta_n^* \ne 0$ if and only if $n \le \kappa_A = \eta^G + 1$. Also
for all $n$,

(5.1) $x_{n+1}^{A*} = \begin{cases} x_n^G + \beta_n^* (A x_n^G + b), & \text{if } n < \eta^G, \\ x_{\eta^G}^G, & \text{if } n \ge \eta^G, \end{cases}$

(5.3) $\|A x_{n+1}^G + b\| \le \|A x_{n+1}^{A*} + b\| < \|A x_n^G + b\|$, if $n < \eta^G$,

(5.4) $\|A x_{n+1}^{A*} + b\|^2 = \|A \bar{x}_{n+1}^{A*} + b\|^2 - \beta_n^{*2} \|A (A \bar{x}_{n+1}^{A*} + b)\|^2$

(5.5) $\le \|A x_n^{A*} + b\|^2 - \beta_n^{*2} \|A (A \bar{x}_{n+1}^{A*} + b)\|^2$.



Proof. We begin by observing that (5.4) and (5.5) follow from the construction
of the $x_n^{A*}$ for all $n$. Let

(5.6) $R = \max\{n \mid \beta_n^* \ne 0\}$.

Then $\beta_n^* \ne 0$ for all $n < R$, since otherwise $x_{n+1}^{A*} = \bar{x}_{n+1}^{A*}$ for some $n < R$ and hence
$\beta_k^* = 0$ for all $k \ge n$. Then an induction argument and the results in Theorem 4.4
show that (5.1) - (5.3) hold for $n \le R$. Clearly, $R \le \kappa_A$. It remains to show that
$R \ge \kappa_A$.

Let therefore $n = R + 1$, and assume that $n \le \kappa_A$. We may also assume that
$n \le \nu(A, r_0)$, since otherwise there is nothing to prove. We want to show that

(5.7) $\bar{x}_{n+2}^{A*} = \bar{x}_{n+1}^{A*}$,

which will imply the direct contradiction $n = R + 1 > \kappa_A$. Now the equation $A \bar{x}_{n+1}^{A*} -
A x_0 = -L_n r_0$ (see (4.7)), where $L_n$ is the orthogonal projection on $A\mathcal{L}_n = A\mathcal{K}_n$, is
equivalent to

(5.8) $\xi^T (A \bar{x}_{n+1}^{A*} + b) = 0$

for all $\xi \in A\mathcal{L}_n$. Showing (5.7) is now equivalent to proving (5.8) for all $\xi \in A\mathcal{L}_{n+1}$,
since then $A \bar{x}_{n+1}^{A*} - A x_0 = -L_{n+1} r_0 = A \bar{x}_{n+2}^{A*} - A x_0$ and hence (5.7) holds. To show
(5.8) for all $\xi \in A\mathcal{L}_{n+1}$, it suffices to show (5.8) for a single $\xi \in A\mathcal{L}_{n+1} \setminus A\mathcal{L}_n$. Now
use the assumption $\beta_n^* = 0$, which is equivalent to

$(A (A \bar{x}_{n+1}^{A*} + b))^T (A \bar{x}_{n+1}^{A*} + b) = 0$.

We wish to show that

(5.9) $\xi = A (A \bar{x}_{n+1}^{A*} + b) \in A\mathcal{L}_{n+1} \setminus A\mathcal{L}_n$,

which will prove (5.8) for all $\xi \in A\mathcal{L}_{n+1}$ and complete the proof of the theorem.
We know that $\bar{x}_{n+1}^{A*} - x_0 \in \mathcal{L}_n$. But $\bar{x}_{n+1}^{A*} - x_0 \notin \mathcal{L}_{n-1} = \mathcal{L}_R$, since otherwise

$A \bar{x}_{n+1}^{A*} - A x_0 = L_{n-1} (A \bar{x}_{n+1}^{A*} - A x_0) = -L_{n-1} L_n r_0 = -L_{n-1} r_0$

and therefore already $\bar{x}_n^{A*} = \bar{x}_{n+1}^{A*}$, contradicting $R < \kappa_A$. Therefore we can write

$\bar{x}_{n+1}^{A*} - x_0 = \sum_{j=0}^{n-1} \lambda_j A^j r_0 \in \mathcal{L}_n$

and $\lambda_{n-1} \ne 0$. Consequently

$A \bar{x}_{n+1}^{A*} + b = r_0 + \sum_{j=0}^{n-1} \lambda_j A^{j+1} r_0 \in \mathcal{L}_{n+1}$.

But if $A \bar{x}_{n+1}^{A*} + b \in \mathcal{L}_n$, then a calculation shows that $A \bar{x}_{n+1}^{A*} - A x_0 \in A\mathcal{L}_{n-1}$ and
hence $\bar{x}_{n+1}^{A*} - x_0 \in \mathcal{L}_{n-1}$, which was ruled out above. Consequently

$A \bar{x}_{n+1}^{A*} + b \in \mathcal{L}_{n+1} \setminus \mathcal{L}_n$.

This implies (5.9) and therefore (5.8) for all $\xi \in A\mathcal{L}_{n+1}$. Then $\bar{x}_{n+2}^{A*} = \bar{x}_{n+1}^{A*}$, and all
conclusions of the theorem follow. □

The theorem shows that optimized Anderson acceleration enjoys the descent property

$\|A x_{n+1}^{A*} + b\| < \|A \bar{x}_{n+1}^{A*} + b\| \le \|A x_n^{A*} + b\|$

for all $n < \kappa_A$, but that it does not accelerate convergence otherwise when compared
to Anderson acceleration with arbitrary $\beta_n \ne 0$.